Text keyword extraction method based on word frequency statistics
LUO Yan, ZHAO Shuliang, LI Xiaochao, HAN Yuhui, DING Yafei
Journal of Computer Applications    2016, 36 (3): 718-725.   DOI: 10.11772/j.issn.1001-9081.2016.03.718
To address the low efficiency and poor accuracy of the traditional TF-IDF (Term Frequency-Inverse Document Frequency) algorithm in keyword extraction, a text keyword extraction method based on word frequency statistics was proposed. First, a formula for same-frequency words (words that occur equally often in a text) was derived from Zipf's law. Second, the proportion of words at each frequency was determined from this formula, showing that most words in a text are low-frequency words. Finally, this word frequency statistics law was applied to keyword extraction, yielding a TF-IDF algorithm based on word frequency statistics. Simulation experiments were conducted on Chinese and English text data sets. The average relative error of the same-frequency-word formula did not exceed 0.05, and the maximum absolute error of the proportion of words at each frequency was 0.04. Compared with the traditional TF-IDF algorithm, the proposed algorithm achieved higher average precision, average recall, and average F1-measure, with a lower average runtime. The simulation results show that, for text keyword extraction, the TF-IDF algorithm based on word frequency statistics outperforms the traditional TF-IDF algorithm in precision, recall, and F1-measure, and effectively reduces runtime.
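The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state their formula or how it modifies TF-IDF. The sketch assumes the classic consequence of Zipf's law that roughly 1/(n(n+1)) of the distinct words in a text occur exactly n times, and it shows only the traditional TF-IDF baseline. All function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Traditional TF-IDF: one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(doc_scores, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def same_frequency_proportions(tokens):
    """Observed share of distinct words that occur exactly n times."""
    word_counts = Counter(tokens)
    spectrum = Counter(word_counts.values())  # n -> number of words seen n times
    vocab_size = len(word_counts)
    return {n: c / vocab_size for n, c in sorted(spectrum.items())}


def zipf_expected_proportion(n):
    """Assumed Zipf-based estimate: share of words with frequency n."""
    return 1.0 / (n * (n + 1))


if __name__ == "__main__":
    docs = [
        ["zipf", "law", "word", "word"],
        ["word", "frequency", "law"],
        ["keyword", "extraction", "word"],
    ]
    for i, doc_scores in enumerate(tf_idf(docs)):
        print(i, top_keywords(doc_scores, k=2))

    tokens = [tok for doc in docs for tok in doc]
    for n, observed in same_frequency_proportions(tokens).items():
        print(n, round(observed, 3), round(zipf_expected_proportion(n), 3))
```

Under this assumed estimate, about half of the distinct words occur once and about one sixth occur twice, which is consistent with the abstract's remark that most words are low-frequency. A toy corpus like the one above is far too small to follow Zipf's law closely, so the comparison is only meaningful on longer texts.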